
Convolutional Neural Network Backpropagation Derivation

2016-05-04 西西 量化投资与机器学习



Disclaimer: It is assumed that the reader is familiar with terms such as Multilayer Perceptron, delta errors or backpropagation. If not, it is recommended to read, for example, the free online book 'Neural Networks and Deep Learning' by Michael Nielsen.


Convolutional Neural Networks (CNN) are now a standard way of doing image classification – there are publicly accessible deep learning frameworks, trained models and services. It is often more time consuming to install the software than to perform state-of-the-art object classification or detection. We also have many ways of getting the knowledge: numerous courses and materials, and even direct ways of reaching the strongest Deep/Machine Learning minds through Quora, Facebook or G+.


Nevertheless, when I wanted to get deeper insight into CNNs, I could not find a "CNN backpropagation for dummies". Again and again I met with statements like: "If you understand backpropagation in standard neural networks, there should not be a problem with understanding it in CNNs" or "All things are nearly the same, except matrix multiplications are replaced by convolutions". And of course I saw tons of ready equations.


It was a little consoling to find out that I am not alone; for example, others have asked the same question about why the weights need to be rotated when computing the gradients in a CNN.

Answering that question, which concerns the rotation of the weights in the gradient computation, is the goal of this long post.

We start from the multilayer perceptron and count the delta errors on our fingers:

We see in the picture above that $\delta^l_j$ is proportional to the deltas from the next layer, scaled by the weights.
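
For reference, in the notation of Nielsen's book that the rest of this post follows, the relation sketched above for a fully connected layer reads:

$$\delta^l_j = \sum_k w^{l+1}_{kj} \, \delta^{l+1}_k \, \sigma'(z^l_j).$$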


But how do we connect the concept of an MLP with a Convolutional Neural Network? Let's play with the MLP:

If you are not sure that after cutting connections and sharing weights we get a one-layer Convolutional Neural Network, I hope that the picture below will convince you:

The idea behind this figure is to show that such a neural network configuration is identical to a 2D convolution operation, and that the weights are just filters (also called kernels, convolution matrices, or masks).
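
To make the figure concrete, here is a minimal numerical sketch (with assumed shapes and random data, not code from the original post): every output neuron computes a dot product between its 3x3 receptive field and the same shared weights, and the resulting map coincides with a valid 2D convolution computed with scipy.signal.convolve2d.

```python
# Minimal sketch: a one-layer "MLP" with local connections and shared weights
# is exactly a 2D (valid) convolution.  Shapes and data are illustrative.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))   # input layer
w = rng.standard_normal((3, 3))   # shared weights = convolution kernel

# "MLP view": slide over the input; every output neuron is a weighted sum of
# its 3x3 receptive field (written here with the kernel flipped, so that it
# matches the true-convolution indexing x-a, y-b used in the post).
out_mlp = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out_mlp[i, j] = np.sum(x[i:i + 3, j:j + 3] * w[::-1, ::-1])

# "CNN view": the same numbers, written as a valid 2D convolution.
out_conv = convolve2d(x, w, mode='valid')

assert np.allclose(out_mlp, out_conv)
```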

Now we can come back to computing gradients by counting on our fingers, but from now on we will focus only on the CNN. Let's begin:

No magic here: in the "blue" layer we have just summed the gradients from the "orange" layer, scaled by the weights. It is the same process as in the MLP's backpropagation. However, in the standard approach we talk about dot products, and here we have ... yup, again a convolution:


Yeah, it is a slightly different convolution than in the previous (forward) case. There we did a so-called valid convolution, while here we do a full convolution (the nomenclature is illustrated in the sketch below). What is more, we rotate our kernel by 180 degrees. But still, we are talking about convolution!
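
As a quick illustration of this nomenclature (the shapes are assumptions made for the sketch): a valid convolution shrinks the map, while a full convolution of the next layer's deltas with the kernel rotated by 180 degrees grows it back to the size of the previous layer.

```python
# Valid vs. full convolution, and the 180-degree kernel rotation.
import numpy as np
from scipy.signal import convolve2d

x = np.ones((5, 5))                  # "blue" (previous) layer
w = np.arange(9.0).reshape(3, 3)     # shared kernel
delta_next = np.ones((3, 3))         # deltas of the "orange" (next) layer
w_rot = np.rot90(w, 2)               # kernel rotated by 180 degrees

print(convolve2d(x, w, mode='valid').shape)              # (3, 3): forward, valid
print(convolve2d(delta_next, w_rot, mode='full').shape)  # (5, 5): backward, full
```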

Now, I have some good news and some bad news:

  1. you see (BTW, sorry for the pictures' aesthetics :) ) that matrix dot products are replaced by convolution operations both in the feedforward and in the backpropagation pass.

  2. you know that seeing something and understanding something are not the same ... yup, we are now going to get our hands dirty and prove the above statement. Before getting to the next part, I recommend reading the backpropagation chapter of M. Nielsen's book, mentioned already in the disclaimer. I tried to keep all quantities consistent with Michael's work.

In the standard MLP, we can define the error of neuron $j$ in layer $l$ as:

$$\delta^l_j = \frac{\partial C}{\partial z^l_j},$$

where $z^l_j$ is just the weighted input:

$$z^l_j = \sum_k w^l_{jk} \, a^{l-1}_k + b^l_j,$$

and, for clarity, $a^l_j = \sigma(z^l_j)$, where $\sigma$ is an activation function such as the sigmoid, the hyperbolic tangent or ReLU.

But here we do not have an MLP but a CNN, and matrix multiplications are replaced by convolutions, as we discussed before. So instead of $z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j$ we have:

$$z^l_{x,y} = w^l * \sigma\!\left(z^{l-1}_{x,y}\right) + b^l_{x,y} = \sum_a \sum_b w^l_{a,b} \, \sigma\!\left(z^{l-1}_{x-a,\,y-b}\right) + b^l_{x,y}.$$

The equation above is just the convolution operation during the feedforward phase, illustrated in the earlier picture of the one-layer CNN.

Now we can get to the point and answer the question of where the rotation of the weights comes from.

We start from the statement:

$$\delta^l_{x,y} = \frac{\partial C}{\partial z^l_{x,y}} = \sum_{x'} \sum_{y'} \frac{\partial C}{\partial z^{l+1}_{x',y'}} \, \frac{\partial z^{l+1}_{x',y'}}{\partial z^l_{x,y}}.$$

We know that $z^l_{x,y}$ affects $z^{l+1}_{x',y'}$, which is shown indirectly in the picture of the forward convolution above. So the sums are the result of the chain rule. Let's move on:

$$\delta^l_{x,y} = \sum_{x'} \sum_{y'} \delta^{l+1}_{x',y'} \, \frac{\partial \left( \sum_{a} \sum_{b} w^{l+1}_{a,b} \, \sigma\!\left(z^l_{x'-a,\,y'-b}\right) + b^{l+1}_{x',y'} \right)}{\partial z^l_{x,y}}.$$

The first term was replaced by the definition of the error, while the second became large because we plugged in the expression for $z^{l+1}_{x',y'}$. However, we do not have to fear this big monster: all components of the inner sums are equal to 0, except the ones indexed by $x = x' - a$ and $y = y' - b$. So:

$$\delta^l_{x,y} = \sum_{x'} \sum_{y'} \delta^{l+1}_{x',y'} \, w^{l+1}_{a,b} \, \sigma'\!\left(z^l_{x,y}\right).$$

If $x = x' - a$ and $y = y' - b$, then obviously $a = x' - x$ and $b = y' - y$, so we can reformulate the above equation as:

$$\delta^l_{x,y} = \sum_{x'} \sum_{y'} \delta^{l+1}_{x',y'} \, w^{l+1}_{x'-x,\,y'-y} \, \sigma'\!\left(z^l_{x,y}\right).$$

OK, our last equation is just ... a full convolution:

$$\delta^l_{x,y} = \left(\delta^{l+1} * \mathrm{ROT180}\!\left(w^{l+1}\right)\right)_{x,y} \, \sigma'\!\left(z^l_{x,y}\right).$$

Where is the rotation of the weights? Note that $\mathrm{ROT180}(w^{l+1})_{x-x',\,y-y'} = w^{l+1}_{x'-x,\,y'-y}$, so reading the weighted sum above as a convolution requires the kernel to be rotated by 180 degrees.

So the answer to the question about the rotation is simple: the rotation of the weights just results from the derivation of the delta error in a Convolutional Neural Network.
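
To convince yourself numerically, here is a small sanity check (a sketch with assumed shapes, a sigmoid activation and random data, not code from the original post): the delta of the previous layer computed directly from the chain-rule double sum coincides with the full convolution of the next layer's deltas with the kernel rotated by 180 degrees.

```python
# Numerical check of: delta^l = full_conv(delta^{l+1}, ROT180(w)) * sigma'(z^l).
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
z_l = rng.standard_normal((6, 6))          # pre-activations of the "blue" layer
w = rng.standard_normal((3, 3))            # shared kernel of the next layer
delta_next = rng.standard_normal((4, 4))   # deltas of the "orange" layer (6-3+1 = 4)
K = w.shape[0]
a_l = sigmoid(z_l)

# Zero-based form of the forward convolution used above:
# z_next[x', y'] = sum_{a,b} w[a, b] * sigmoid(z_l)[x'+K-1-a, y'+K-1-b] + b.

# 1) Delta of the "blue" layer from the explicit chain-rule double sum.
delta_direct = np.zeros_like(z_l)
for x in range(6):
    for y in range(6):
        s = 0.0
        for xp in range(4):
            for yp in range(4):
                a_idx, b_idx = xp + K - 1 - x, yp + K - 1 - y
                if 0 <= a_idx < K and 0 <= b_idx < K:
                    s += delta_next[xp, yp] * w[a_idx, b_idx]
        delta_direct[x, y] = s * a_l[x, y] * (1 - a_l[x, y])   # sigma'(z) for sigmoid

# 2) The same thing as a full convolution with the 180-degree rotated kernel.
delta_conv = convolve2d(delta_next, np.rot90(w, 2), mode='full') * a_l * (1 - a_l)

assert np.allclose(delta_direct, delta_conv)
```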

OK, we are really close to the end. One more ingredient of the backpropagation algorithm is the update of the weights $w^l_{a,b}$:

$$\frac{\partial C}{\partial w^l_{a,b}} = \sum_{x} \sum_{y} \delta^l_{x,y} \, \sigma\!\left(z^{l-1}_{x-a,\,y-b}\right) = \left(\delta^l * \sigma\!\left(\mathrm{ROT180}\!\left(z^{l-1}\right)\right)\right)_{a,b}.$$
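
The same kind of numerical sketch works for the weight gradient (again with assumed shapes and a sigmoid activation, not code from the post): the explicit double sum coincides with a valid convolution of the deltas with the 180-degree rotated activations of the previous layer.

```python
# Numerical check of: dC/dw = valid_conv(ROT180(sigma(z^{l-1})), delta^l).
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
z_prev = rng.standard_normal((6, 6))      # z^{l-1}
delta_l = rng.standard_normal((4, 4))     # delta^l of the convolutional layer
a_prev = sigmoid(z_prev)
K = 3

# Zero-based form: dC/dw[a, b] = sum_{x,y} delta^l[x, y] * a_prev[x+K-1-a, y+K-1-b].
grad_direct = np.zeros((K, K))
for a in range(K):
    for b in range(K):
        grad_direct[a, b] = np.sum(
            delta_l * a_prev[K - 1 - a:K - 1 - a + 4, K - 1 - b:K - 1 - b + 4])

# The same gradient as a valid convolution of delta^l with the rotated activations.
grad_conv = convolve2d(np.rot90(a_prev, 2), delta_l, mode='valid')

assert np.allclose(grad_direct, grad_conv)
```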

So, paraphrasing the backpropagation algorithm for the CNN (a toy numerical sketch of these steps follows the list):

  1. Input x: set the corresponding activation $a^1$ for the input layer.

  2. Feedforward: for each $l = 2, 3, \ldots, L$ compute $z^l_{x,y} = w^l * \sigma\!\left(z^{l-1}_{x,y}\right) + b^l_{x,y}$ and $a^l_{x,y} = \sigma\!\left(z^l_{x,y}\right)$.

  3. Output error $\delta^L$: compute the vector $\delta^L = \nabla_a C \odot \sigma'(z^L)$.

  4. Backpropagate the error: for each $l = L-1, L-2, \ldots, 2$ compute $\delta^l_{x,y} = \left(\delta^{l+1} * \mathrm{ROT180}\!\left(w^{l+1}\right)\right)_{x,y} \, \sigma'\!\left(z^l_{x,y}\right)$.

  5. Output: the gradient of the cost function is given by $\dfrac{\partial C}{\partial w^l_{a,b}} = \left(\delta^l * \sigma\!\left(\mathrm{ROT180}\!\left(z^{l-1}\right)\right)\right)_{a,b}$.
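
Finally, here is a toy end-to-end sketch of the five steps for a single convolutional layer followed by a sigmoid and a quadratic cost (the shapes, the cost and the finite-difference check are illustrative assumptions, not code from the post):

```python
# Toy forward/backward pass for one convolutional layer, following steps 1-5.
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
x = rng.standard_normal((6, 6))        # step 1: input activation a^1 = x
w = rng.standard_normal((3, 3)) * 0.1  # shared kernel
b = 0.1                                # shared bias
t = rng.standard_normal((4, 4))        # target for the quadratic cost

def forward(w, b):
    z = convolve2d(x, w, mode='valid') + b   # step 2: z^2 = w * a^1 + b
    a = sigmoid(z)                           #         a^2 = sigma(z^2)
    return z, a, 0.5 * np.sum((a - t) ** 2)

z2, a2, cost = forward(w, b)

delta2 = (a2 - t) * a2 * (1 - a2)            # step 3: output error delta^L
# Step 4 would backpropagate delta^L through earlier layers with full
# convolutions of the rotated kernels (see the derivation above); with a
# single convolutional layer there is nothing left to backpropagate.
grad_w = convolve2d(np.rot90(x, 2), delta2, mode='valid')   # step 5: dC/dw
grad_b = np.sum(delta2)                                     #         dC/db

# Finite-difference check of one weight gradient.
eps = 1e-6
w_pert = w.copy()
w_pert[0, 0] += eps
num_grad = (forward(w_pert, b)[2] - cost) / eps
print(grad_w[0, 0], num_grad)   # these two numbers should agree closely
print(grad_b)                   # bias gradient, shown for completeness
```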
